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This technical report accompanies the manuscript "Conditional Modeling and the Jitter Method of 
Spike Re-sampling," lfT4ll and provides further details, comments, references, and equations that were 
omitted from this main text in the interest of brevity. To ease referencing, the sectioning of the report 
follows that of the main text. 

A few of our remarks in the supplement may be of interest to a broad audience. For quick identifica- 
tion, these high-level remarks are bordered by a left vertical bar, like the one to the left of this paragraph. 
The bulk of this document, however, contains technical details about the various simulations and data 
analyses presented in the main text. These would primarily be of interest to a reader who was hoping to 
reproduce our methods exactly. There is also a self-contained Mathematical Appendix at the end of the 
supplement that provides a more formal treatment of jitter. 

1 Introduction 

Note on terminology 

Some authors prefer alternative terms for jittering, such as "dithering" lfT6l [TTl |2T1 122]| . "teetering" 
|[35l|, or "artificial jitter" [34], etc., presumably to distinguish from another use of the word "jitter" as 
the intrinsic temporal variability in individual spikes. For example, in some situations involving highly 
reliable B4ll or simulated or modeled [21 1 spike trains, individual spikes can unambiguously be placed in 
correspondence with one another across trials. In that case, the temporal variability in a spike's timing, 
under this correspondence, can be quantified directly and is commonly called the "spike jitter" ||441|6T| . 
In this paper, we continue to use "jitter" in its resampling sense, leaving these two uses of the term to be 
disambiguated by context. 

Hypothesis testing and p-values 

The introduction mentions hypothesis testing. Recall that a hypothesis test consists of 

• A null hypothesis, often denoted Hq, which is a collection of hypothetical distributions for the data. 

• An alternative hypothesis, often denoted Ha or Hi, which is another collection of distributions for the 
data, disjoint from Hq. 

• A critical region or rejection region, say C, which is a collection of possible outcomes of the data for 
which we reject Hq whenever the data is in C. 

We evaluate the performance of a hypothesis test by its error probabilities. For a probability distribution 
P ^ Hq, an error (a type I error) is made when the data occurs in the critical region. The probability of a 
type I error is P{C). For P Hq, including P G Ha, an error (a type II error) is made when the data does 
not occur in the critical region. The probability of a type II error is 1 — P{C). The size of a hypothesis 
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test is the maximum error probability for all distributions in Hq, i.e., maxp^Ho P{C)- When designing a 
hypothesis test, one usually tries to keep the size below some prespecified level, say 0.05. The specification 
of Ha is useful for choosing among tests whose size does not exceed this level, with the idea being to 
choose a test that has good power, i.e., low error probabilities, for distributions in Ha- Consequently, Ha 
is important for interpreting a rejection of Hq. Note, however, that only Hq is used for quantifying the size 
of a test. Note also that there may be distributions in neither Hq nor Ha, in which case a rejection of Hq is 
considered the correct decision, even though Ha does not include the data distribution. This is an example 
of model misspecification and it can lead to erroneous scientific conclusions. 

It is common to create a critical region via a test statistic, T, which is just a scalar summary of the data, 
and a critical value, a, so that the critical region is the event C = {T > a} = {x \ T{x) > a} (or some 
similar set, like {T < a}), where x denotes a dataset. A p-value, say p, is a special type of test statistic 
with the property that a critical region of the form C = {p < a} always creates a hypothesis test of level 
a, that is, of size < a, for any < a < 1. In other words, P{p < a) < a for all P G Hq. Given an 
ordinary test statistic T and critical regions of the form C = {T > a}, one can always create a p-value 
\inp{x) = maxpg//p P{{x' ■ T{x') > T{x)}), where x and x' denote datasets. P-values are useful for 
communicating hypothesis tests since each reader can choose his or her desired threshold for maximum 
probability of type I error. We refer the reader to |[95l . for example, for many more details and examples 
about hypothesis testing. 

Poisson processes 

Poisson processes and their generalizations are the most basic types of point processes. Although 
jitter methods are applicable for many non-Poisson processes (which is fortunate, since neural spike 
trains are often poorly approximated by Poisson processes |[Il|8ll)> we use Poisson processes in many 
examples because of their simplicity and familiarity. 

Imagine partitioning time into extremely fine bins of length A. In each bin we (independently) flip a coin 
with probability AA for heads. Heads means that we observe an event in that time bin. Tails means that we 
do not. A (homogeneous) Poisson process is the generalization of this procedure for infinitesimally small 
time bins (that is, holding A fixed and letting A — )• 0). If the value of A is allowed to vary across time 
bins, so that we get a function \{t) for infinitesimally small bins, then this is an inhomogeneous Poisson 
process with intensity (or rate) function \{t). If we first randomly choose the function A(t) itself from some 
collection of possible functions, then this is a Cox process. We refer the reader to [90], for example, for 
details about these and other point process models. 

Trial-to-trial variability 

The introduction also mentions "trial-to-trial variability" and suggests that trial-to-trial variability 
may be as much a result of model misspecification as it is a result of non-stationarity in the data. We 
would like to illustrate this comment with a simple example called "amplitude variability" or "excitability 
variability" [5]. 

Consider repeated independent trials of a point process. Suppose we generate the data on each trial 
by first choosing randomly and independently a firing rate, say 50% of the time the firing rate is 10 Hz 
and 50% of the time the firing rate is 20 Hz. Then we generate the spike train as a homogeneous Poisson 
process with the selected firing rate. On the next trial we randomly choose the rate again from one of the 
two choices, and then generate the spikes. The resulting process is a Cox process, since each trial is a 
conditionally Poisson process with a random rate. 

Is there "trial-to-trial variability" in this example? The answer depends on how we choose to model 
the data. If our model class includes Cox processes, then we can view the trials as independent and 
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identically distributed. There is no non-stationarity, no "trial-to-trial variability". Alternatively, if we 
wanted to model the data as a Poisson process, then the trials cannot be identically distributed, but must 
have different rates on different trials. The data exhibit "trial-to-trial variability" from the point of view of 
our model class, and we would need to introduce trial-specific parameters to allow the Poisson intensity to 
change across trials. In this example, "trial-to-trial variability" is not an intrinsic feature of the distribution 
generating the data, but rather appears as a consequence of our modeling assumptions. 

Of course, not all trial-to-trial variability results from a misspecified model. If the first 5 trials are 
almost always 10 Hz and the next 5 are almost always 20 Hz, then the data cannot be both independent 
and identically distributed (iid) across trials, regardless of the model class. We use the term "trial-to-trial 
variabihty" rather loosely in the main text. Jitter methods are unaffected by coarse-temporal trial-to-trial 
variabihty, regardless of the source. 

2 Spike re-sampling, spike jitter, and conditional inference 

2.1 Trial shuffles and interval jitter 
Permutation tests 

Trial-shuffling corresponds to the statistical concept of a permutation test. Given a sequence of data 
xi, . . . , Xn, a permutation test is a test of the null hypothesis that there is nothing special about the order 
in which we observed xi, . . . ,Xn- Monte Carlo permutation tests are easy to implement: simply randomly 
permute the order of the observed data, and do this many times, and check if the observed data looks unusual 
among the collection of permuted datasets. 

There are a variety of interesting null hypotheses that can be tested with a permutation test. For example, 
suppose that xi, . . . , Xm are iid samples from experimental condition 1 and Xm+i, ■ ■ ■ ,Xn are iid samples 
from experimental condition 2. If there is no difference in conditions, then there is nothing special about the 
ordering of the x's, so a permutation test can be used to test for differences across conditions. 

For another example, suppose that we have iid pairs {xi,yi), . . ., y„) and we want to test whether Xj 
and Ui are independent. If they are independent, then the ordering of the x's does not matter, regardless of the 
ordering of the y's. So a permutation test can be used to test independence. If Xi and yi are simultaneously 
recorded spike trains, then this permutation test is the basis for "shuffle correction" in cross-correlation 
analysis. In the text below we refer to trial shuffling as testing independence, but note that it is also testing 
that the trials are iid. 



Figure 1 

Data generation. On trial k, we first generate ni^k, 
ative function 



• • ■ J /^40,fe iid uniform(0, 1). Then we create the nonneg- 



which satisfies fk{t)dt = 50 for any cr > 0, where denotes the indicator (zero/one) function for the 
event A. Each is the sum of a constant baseline function and 40 different double-exponential probability 
density functions, which are wrapped around the interval [0, 1) to preserve the total integral. For Figure 1 
we use (T = 0.05. Then we generate two independent observations from an inhomogeneous Poisson process 
with intensity function fk{t). These are the two spike trains on trial k, one for each neuron. (Later we create 
a third independent observation; the third spike train is used below in Figure 3.) Each of the 100 trials is 
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created in this manner (beginning with new random ^u's). This is a Cox process. Figure 1 A shows /i , • . . , /s 
and the corresponding spikes on the first 5 trials. Figure IB, gray line, shows J2]^=i /fc(*)/100. 

We concatenated the trials to create a single long spike train for each of the spike trains. (This is not 
important here — we did it for algorithmic reasons — but it does have tiny implications for edge effects at 
the trial boundaries. It does not affect the vaUdity of any of our statistical conclusions. In particular, edge 
effects are not an issue for our interval jitter experiments below, because we placed interval boundaries at 
trial boundaries.) Specifically, we mapped the spike times in trial k from the interval [0, 1) to the interval 
[k — 1, k) by simply adding k — 1 to the spike times. For each neuron, this converts 100 trials of a one 
second spike train into a single observation of a 100 second spike train over the interval [0, 100). Let iVj be 
the total number of spikes from neuron i and let 1^ i < 2 < • • • < ^,Afi be the corresponding observed 
spike times after trial concatenation. To recover the trial-relative times of a spike, we can use Y^^k mod 1. 

PSTHs. Peri-stimulus time histograms (PSTHs) were constructed using 50 ms box smoothing and aver- 
aging across trials. In particular, the PSTH for neuron i at trial-relative time t G (0, 1) was 

1 ^ l{t - 0.025 < {Yi^k mod 1) < t + 0.025} 
100^ 0.050 

k=l 

in units of Hz or spikes/s, where 1{^} is the indicator function of the event A, taking the value 1 if ^4 is true 
and otherwise. PSTHs were plotted using t on a 2 ms grid as the two black lines in Figure IB. 

CCHs and lag-0 synchronies. Cross-correlation histograms (CCHs) were constructed using 2 ms box 
smoothing and considering lags up to ±250 ms. The CCH at lag r was 

^ ^ l{r - .001 < Y2,e - < T + .001} 

k=l 1=1 

in units of total number of spike pairs across all trials. CCHs were plotted for r G (.25, .25) on a 0.4 ms 
grid. The CCH value at r = corresponds to the total number of ±1 ms precise synchronies. Note that 
reversing the role of neuron 1 and 2 merely switches positive lags to negative lags (i.e., reflects the CCH 
horizontally about lag zero) and does not affect the definition of lag-0 synchronies. The original CCH is 
shown in Figure IC (black line). The observed value of c(0) is 575 (almost 6 synchronous pairs per trial). 

Shuffle-surrogate CCHs. A shuffle-surrogate CCH is created by randomly permuting the trial order for 
spike train 1 (i.e., before concatenating the spikes into a single trial), and then computing the CCH as above 
between the trial-shuffled version of spike train 1 and the original version of spike train 2. We create a 
collection of M = 10, 000 such shuffle-surrogate CCHs, each using different independent realizations of 
random permutations of the trials. We will denote the CCHs as cq, ci, cm, where co(r) is the original 
CCH at lag r and Cm{T), for m > 0, is the mth shuffle-surrogate CCH at lag r. 

Lag-0 distribution. The lag-0 distribution (Figure ID) is simply a histogram of the values 
co(0), . . . , cm(0). The black vertical line in Figure ID shows where co(0) occurs within this histogram. 
It is well known that the right tail probability of this histogram is a p-value for testing the independence 
between the two spike trains, namely, 

1 

P-value = — ^{^-(0) ^ 

m=0 

See the discussion of permutation tests above. 

Mean shuffle-surrogate CCH. We also compute the empirical mean CCH after shuffling, that is 

1 ^ 

m=l 



4 



The light gray horizontal curve in the middle of Figure IC shows the mean shuffle-surrogate CCH as a 
function of lag. Note that the average CCH after shuffling does not show the broad peak at zero. The light 
gray vertical bar in the middle of the histogram in Figure ID shows where ^(0) occurs within the lag-0 
histogram. 

Shuffle-derived pointwise acceptance bands. To create the acceptance bands, we first sort the elements 
co(t), . . . , cm(''") to get a sequence C(o)(t) < • • • < C(Af)(r). Note that the indexing still starts at zero 
(corresponding to the minimum), which is not standard for order statistics. Then we set a(r) = C( 025Af) 
and 6(r) = 0( 9757;^). The interval [a(r), 6(r)] contains (at least) 95% of the shuffle-corrected CCH values 
at lag r (including the original). We repeat this separately for each r. On Figure IC the dark gray region 
corresponds to the interval between [a(T), 6(r)] as r varies. With r chosen a priori (say, r = 0), we can 
reject at level 0.05 the null hypothesis that the spike trains are independent whenever co(t) does not land in 
the interval [^(t) , 6(t)] . The dark gray region in Figure ID corresponds to the part of the histogram between 
a(0) and 6(0). 

Controlling for multiple comparisons. If we do not pick r a priori (or if we pick more than one r a 
priori), and we look for some r where co(t) is outside of the pointwise acceptance interval, then we cannot 
reject at level 0.05 and must do something to control for multiple hypothesis tests. Bonferonni is often too 
conservative. Here is something simple that combines all r in order to rigorously test independence that we 
have found works well in many situations and provides a nice visual display. First, robustly standardize the 
collection of CCHs at each r and then compute the maximum and minimum of each standardized CCH, that 
is, compute 



M-l 

i 



^ M-l 
— j- Y Hm)iT') 



M _ 

m=l 



^ M-l 

3^ Y (cm('^) ~ ^('^)) 



M -2 



m=l 



s[t) t , 

(Notice that v{t) and s(r) are computed without the extreme values, and are therefore robust to single 
outliers in either tail. Notice also that they are based on the sorted values, and so may contain the original 
CCH.) Now order the maximums and minimums, i.e., c^^ < • • • < and c^^ < • • • < c^^^jy If 

Cq < c^Q25A/) ^ ^^(0 9751/)' ^^'^'^ '^^^ reject the null hypothesis of independence at level 0.05. 

Although it takes a while to wade through all of the preprocessing, it is straightforward to verify that the 
maximum (or minimum) statistics (which combine all time lags) can be shuffled without changing their 
distribution under the null hypothesis of independence. The reader is referred to ||99l [97l for more details 
about resampling based multiple testing. 

We note that independence is rejected for the data in Figure 1 (as it should be — the spike trains are 
correlated on coarse timescales by virtue of sharing the same random intensity function) using this particular 
method of controlling for multiple comparisons. 

Shuffle-derived simultaneous acceptance bands. The multiple comparisons method described above can 
be used to create simultaneous acceptance bands, which provide a nice visual display to augment pointwise 
bands. They also show which time bin offsets were the most unusual from a multiple comparisons per- 
spective. Define a*(r) = 025Af) + ^*(''") — '^(0 975A/)^(^''") ^(''")- straightforward 
to verify that we reject the null hypothesis using the multiple comparisons method described above exactly 
when there exists a r for which co(t) is not in the interval [a*(r), 6*(r)]. On Figure IC the light gray region 
corresponds to the interval between [a*(T), 6*(r)] as r varies. The CCH exceeds the simultaneous bands, 
signaling a rejection of the null hypothesis. The light gray region in Figure ID shows the points between 
a*(0) and 6*(0) for easy comparison to Figure IC, but note that the entire collection of CCHs, not just the 
lag-0 values, are used in the construction of simultaneous acceptance bands. 
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Shuffle-corrected CCHs. For visualization purposes, we show the shuffle-corrected CCH as a black line 
in Figure IE. The shuffle-corrected CCH is the original CCH minus the shuffle mean, i.e., co(r) — /i(r). 
Under the null hypothesis of independence, the shuffle-corrected CCH has expected value zero for all r, so 
"large" variations from zero (sohd light gray line in Figure IE) are evidence against the null. The acceptance 
bands described above are one way to quantify "large". On Figure IE we also show the shuffle-corrected 
version of these bands, namely, the dark gray region corresponds to the interval between [a(r) — /i(T) , 6(r) — 
/x(t)] as r varies and the light gray region corresponds to the interval between [a*(r)— /i(r), 6*(r)— /i(r)] as 
r varies. Note that the shuffle-corrected CCH exceeds the shuffle-corrected bands exactly when the original 
CCH exceeds the original bands. The "correction" is only for visualization purposes. 

Interval jitter 

Additional remarks. As with trial shuffling, interval jitter is a procedure for generating an ensemble 
of spike processes from a single observed process. The difference is that spikes are jittered within a 
priori defined windows ("intervals") rather than shuffled across trials. Thus for a jitter window of 5 ms, 
the recording is first partitioned into successive 5 ms intervals. An ensemble of spike processes is then 
generated by relocating each spike, independently, to a random point within its original 5 ms interval. 
Everything else is the same: a statistic, such as the number of one-millisecond synchronies, is chosen, 
and the observed value from the original, unperturbed, spike train is compared against the ensemble of 
values collected through the jitter process. If many statistics are available, such as all of the lags in a CCH, 
then pointwise and simultaneous acceptance bands, and "jitter-corrected" displays can be constructed, as 
well. 

Figure 2 

Interval jitter The mth interval-jitter surrogate for neuron i using jitter windows of length 5 > is 

• ■ ■ , ^.S) = ^on{5\_Y,,,/5\ +5U^\... ,5\Y,,nJ5\ + 6U^:^J 

where U-^^ are iid uniform(0, 1) random variables for all i,k,m, and where [-J is the floor function (greatest 
integer less than). In words, S[Yi^k/S\ moves the fcth spike of neuron i to the start of its (5-length jitter 
window. Then we add SU^^^ to get a uniformly chosen location within that same jitter window. We do this 
independently for every spike of each spike train. Finally, we sort the results of each spike train so that the 
spike times are increasing. We used 6 = 0.02 and repeated this M = 10, 000 times, i.e., m = 1, . . . , 10, 000, 
to create 10, 000 surrogate jittered datasets. 

Replacing trial-shuffling with interval jitter For each of the M = 10, 000 surrogate jittered datasets we 
computed the CCH between spike train 1 and 2, giving M interval-jitter surrogate CCHs, say, ci, . . . , cm, 
using the same notation for the shuffle surrogate CCHs described above. As before, it is convenient to let 
Co denote the CCH of the original (unjittered) dataset. Now everything proceeds exactly as described above 
for the collection of shuffled-derived CCHs, except we use the collection of jitter-derived CCHs. The lag-0 
values, namely, co(0), ci(0), . . . , cm{^), are still the ±1 ms synchronies, but now the surrogates correspond 
to the number of synchronies after jittering (instead of after shuffling). P- values are computed the same way 
and acceptance bands are computed the same way. The corresponding hypothesis test is described below. 
For jitter-correction, we subtract the mean, say ii{t), of the jitter-surrogate CCHs from the original CCH 
and any acceptance bands for easy visualization. 

2.2 Hypothesis Testing and Conditional Inference 



6 



Interval jitter null hypothesis. Fix the jitter window length 5. Define 

Si{k) = #{spikes from neuron i in time interval [5{k — l),5k)^ 

where time refers to absolute time, not trial-relative time. For the data in Figure 1 with 5 = 0.02, for 
example, 5*1 and S2 are each length 5000 nonnegative integer-valued vectors. The interval jitter null 
hypothesis is 

Hq. The conditional distribution of the data is uniform given the vectors Si for all i. 

Notice that Hq depends on 5 via the definition of the 5"s. Hq makes no assumption about the distribution 
of the S"s, which are coarse-temporal statistics of the spike trains. Therefore, Hq says nothing about the 
comparison of two datasets with different interval counts. On the other hand, if two datasets have exactly 
identical S"s, even if the precise timings of spikes are completely different, then Hq states that the two 
datasets are equally likely to have occurred. 

If (and only if) the S"s are independent Poisson random variables, then the null hypothesis states 
that the spike trains are independent inhomogeneous Poisson processes with piecewise constant intensity 
functions that are constant within jitter windows. This is but one of infinitely many classes of point pro- 
cesses that are included in Hq. For example, the null hypothesis also includes doubly-stochastic Poisson 
processes (Cox processes) whose random inhomogeneous intensity function is piecewise-constant over 
the jitter intervals, in which case the 5's are neither independent nor Poisson. 

The version of interval jitter used here places jitter window boundaries at the points 
. . . , —35, —25, —5, 0, 5, 25. 35, ... . There is nothing special about using these particular jitter window 
boundaries. Indeed, there is no reason that they need all be the same length. The key for creating a valid 
hypothesis test is that the jitter window boundaries do not depend on the data and that the boundaries are 
in the same locations for all jitter surrogates of the same spike train. Of course, the boundaries influence 
the interpretation of the resulting hypothesis test, since they create the null hypothesis. 

There are many variations on this interval jitter theme. A common one is to jitter only neuron I (say) and 
leave all of the other neurons fixed. This tests hypotheses about the temporal resolution of neuron 1 with 
respect to the others, without imposing assumptions about the temporal resolution of the others. The null 
hypothesis would be 

Hq: The conditional distribution of the data is uniform given the vector S\, and given the values of all 
spike times other than those from neuron 1 . 

Since we are conditioning on more aspects of the data, this single neuron version of interval jitter is an even 
larger null hypothesis than the original. Consequently, tests will tend to be more conservative (harder to 
reject), but a rejection here imphes a rejection of the original null. For computational reasons, we use the 
single neuron version of interval jitter below when we discuss relaxing the uniformity assumption in the null 
hypothesis. 

Conditional inference 

Conditional versus unconditional inference. Jitter and many of the related resampling techniques 
currently in use in neuroscience are best interpreted as Monte Carlo conditional inference, or perhaps, 
Monte Carlo approximate conditional inference. We have noted a systematic tendency to blur the distinc- 
tion between conditional inference and unconditional inference with regard to these jitter-like resampling 
techniques. Unconditional bootstrap, in particular, seems to be a common (mis)interpretation, perhaps 
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because bootstrap computations are strongly associated with Monte Carlo methods. Mistaking a condi- 
tional procedure for an unconditional one can lead to statistical and scientific errors. This seems espe- 
cially the case for the types of neuroscience examples that we have in mind. In this section we provide a 
simple simulation example to illustrate the differences between conditional and unconditional inference. 

Consider 100 trials, each lasting 1 second, of two simultaneously recorded spike trains. We are 
interested in precise zero-lag correlations and want to understand the distribution of the statistic T = c(0), 
defined above as the total number of ±1 ms synchronous spike pairs. For example, we may want to 
understand the distribution of T in order to construct accurate critical values for hypothesis testing. 

For unconditional inference we want to understand the unconditional distribution of T. For example, 
suppose that the neurons are independent homogeneous Poisson processes with rates 50 Hz (neuron 1) 
and 25 Hz (neuron 2), respectively, and that trials are iid. We generated 10^ datasets from this distribution, 
computed the value of T on each one, and displayed the resulting empirical distribution in Supplementary 
Figure S[T]Panel A. This is an excellent approximation of the true unconditional distribution of T because 
we used so many Monte Carlo samples (from the true distribution). If we did not know that the neurons 
were 50 Hz and 25 Hz homogeneous Poisson processes, but were given some data, we might try to use 
bootstrap to approximate this unconditional distribution of T. 

For conditional inference we want to understand the conditional distribution of T given some other 
event. For interval jitter, this other event is the observed sequence of spike counts in all of the jitter win- 
dows for all neurons. Supplementary Figure S[T]Panels B-E show four different examples of conditional 
distributions of T given different observed sequences of spike counts in 5 ms jitter intervals. Again, each 
of these is actually an empirical distribution using 10^ Monte Carlo samples from the respective condi- 
tional distributions. Sampling from the true conditional distribution is easy because the spike times are 
conditionally uniform given the spike counts. For any given dataset, we will have a specific sequence of 
spike counts, which gives a specific conditional distribution of T, and this (and only this) is the distribu- 
tion we want to understand. 

Notice that the conditional distributions can be quite different from the unconditional distribution. 
Notice also that these conditional distributions are what interval jitter generates. Finally, notice that a 
great many distributions (other than 50 Hz and 25 Hz independent homogeneous Poisson processes) give 
rise to the same conditional distributions of T. Most of these processes will have markedly different 
unconditional distributions of T. 

2.3 Synchrony as a Test Statistic 

Although any test statistic can be used in conjunction with jitter, we note that the choice of test statistic 
strongly affects both the statistical power of jitter-based hypothesis tests and the interpretation of any scien- 
tific conclusions drawn from jitter about the alternative hypothesis. 

2.4 Basic jitter 

Basic jitter. Basic jitter is a heuristic procedure that jitters spikes in windows centered at the original spikes. 
For basic jitter, the mth surrogate of spike train i is 

• • • , = + 6{ulf - 1/2), . . . , y,^. + 5{ui-l - 1/2)) 

Basic jitter does not correspond to Monte Carlo sampling from any conditional distribution. If it did, then 
the conditioning event would have to specify (either explicitly or implicitly) the center of the jitter windows, 
which means it would specify the locations of the original spikes, which means the resulting conditional 
distribution would have no variability (it would be a point mass on the locations of the original spikes). This 
is true for any type of spike-centered jitter, regardless of the distribution used to jitter spikes (here, uniform. 
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Figure SI: Unconditional versus condi- 
tional distributions. Panel A shows the dis- 
tribution of the total number of 1 ms syn- 
chronous pairs in 100 iid one-second tri- 
als between two independent 50 Hz and 25 
Hz homogeneous Poisson processes. This 
distribution was estimated using 10^ Monte 
Carlo observations. The panels B-E corre- 
spond to four different observations, respec- 
tively, among the lO*^ observations used to 
create panel A. In each case, panels B-E 
show the conditional distribution of the total 
number of 1 ms synchronous pairs given the 
sequence of spike counts in 5 ms windows. 
Each of these conditional distributions was 
approximated with 10'' uniform interval jit- 
ter Monte Carlo surrogates. Since none of 
four exemplars have the identical sequence 
of spike counts, the conditional distributions 
are different. All of the 5 graphs are on the 
same scale. The observed synchrony counts 
for the original data from panels B-E were 
225, 253, 274, and 235, respectively. 
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but other authors have used Gaussian or triangular jitter distributions centered at the original spikes). No 
such procedure should be interpreted as a statistical hypothesis test. 

2.5 Accidental synchrony 
Figure 3 

Injected synchrony. Locations for injected pairs were produced by sampling from the same inhomogeneous 
Poisson processes used in the original experiments depicted in Figure 1, but with the intensity functions 
scaled down from 50 Hz to 0.25, 0.50, and 0.75 Hz, respectively for the three additional experiments, 
corresponding to a mean total number of injected synchronous pairs of 25, 50, and 75 over the 100 trials. 
In order to maintain identical marginal (single-neuron) statistics following the addition of the synchronous 
pairs, each Poisson spike process was first thinned by subjecting every spike, independently, to random 
ehmination. The actual probability of elimination was, for example, 0.005 for the experiment with 0.25 Hz 
injected synchrony, and more generally was ^ / 50 for the experiment with synchronies injected at Hz. 

Here are the exact details. We refer to the data generation description above for Figure 1 . We begin with 
the identical dataset used in Figures 1 and 2. We also create a third independent spike train from the same 
distribution, i.e., with intensity function fk{t) on trial k. Let denote the jth trial-relative spike time 
on the kth trial of neuron i, where i = 1,2, 3, and A; = 1, . . . , 100, and j = 1,. . . , N^^k, where iVj ^ is the 
number of spikes from neuron i on trial k. The spike times (the Z's) for these three spike trains are fixed 
for each of the different injected synchrony experiments. We will use them to build datasets with differing 
amounts of synchrony. 

Now we generate iid uniform(0, 1) random variables one for each spike time These U's 

are also fixed for all of the experiments. To generate a dataset with < /i < 50 Hz injected synchrony, we 
create two new spike trains as follows: 

• The spike times in the new spike train 1 are those ^ij,fe with Uij^k < 1 — h/50 and also those Z^j^k 
with Usj^k < h/50. 

• The spike times in the new spike train 2 are those Z2j^k with [/2,j,fe < 1 — ^/50 and also those Z^j^k 
with Usj^k < h/50. 

Notice that the two new spike trains share the selected spikes from the third old spike train. It turns out that 
each of the two new spike trains has the same distribution, individually, as those from the dataset in Figure 1 . 
(This is a result of the fact that thinned Poisson processes and superimposed independent Poisson processes 
are both still Poisson processes.) When h = 0, it is the same dataset and the two new spike trains are, of 
course, still independent. But when h > the two spike trains are no longer independent. They share some 
spikes. We use h = 0.0, 0.25, 0.50, 0.75 in the experiments for Figure 3. All other details are identical to 
the experiments in Figures 1 and 2. 

The rationale for this method of creating an injected synchrony dataset is that the datasets for different 
amounts of injected synchrony are as similar as possible, and thus easier to compare. The figures can, 
however, be misleading because (if they are interpreted as independent datasets) they give the impression 
of much smaller variability than would be actually observed across different experimental datasets with 
similar distributions. 

Crude estimate of injected synchrony. The estimate of injected synchrony mentioned in Section 3.1 
of the main text and in the Figure 3 legend is the height of the lag-0 peak in the jitter corrected CCH, 
namely, co(0) — /x(0), as discussed above in the Supplementary discussion of Figure 2. We note that this 
estimate has substantial bias that can be improved by taking seriously the model of injected synchrony. A 
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small p- value should always be accompanied by some scientifically interpretable measure of the degree 
of departure from the null hypothesis. This crude estimate serves just such a purpose for synchrony. It 
is in units of total number of coincident spike pairs. Dividing by time gives an estimate of the injected 
synchrony rate. An extremely small injected synchrony rate is unlikely to be scientifically interesting, 
regardless of how small the p- value is. 

For convenience, the injected spike times are identical and could therefore be uniquely identified in the 
data. Our test statistic, however does not take advantage of this. The test statistic (millisecond-accurate 
synchrony count) detects 575 synchronous pairs even without any injected synchronies. Indeed, the sim- 
ulations and estimates are essentially unchanged if the injected spikes are not identical, but are offset by 
some random amount on the order of 1 ms. This would be more realistic, but adds an additional layer of 
compUcation in the simulations, especially if we want to ensure that the marginal (i.e., single spike train) 
distributions are truly identical across differing amounts of injected synchrony. 

2.6 Temporal resolution 
Figure 4 

Data distribution. We first generated fii, . . . , /i4o iid uniform(0, 1). The sorted values were 0.032, 0.034, 
0.036, 0.046, 0.097, 0.098, 0.127, 0.142, 0.158, 0.171, 0.277, 0.278, 0.317, 0.392, 0.422, 0.485, 0.547, 
0.632, 0.655, 0.656, 0.679, 0.695, 0.706, 0.743, 0.758, 0.792, 0.800, 0.815, 0.823, 0.849, 0.906, 0.913, 
0.916, 0.934, 0.950, 0.957, 0.958, 0.959, 0.965, 0.971. (These are the same /i's used on the first trial in the 
experiments in Figures 1-3.) For the experiments here, these 40 values are fixed for all trials and all data 
sets. 

For a fixed bandwidth a, the 100 trials of each neuron are independent and they are each iid samples 
from an inhomogeneous Poisson process with intensity function 

\ J=l i=-oo ' " ^ ' " 'I 

This is the same as Figure 1 , but here it is fixed for all trials (and the bandwidth is different for different 
experiments). Recall that f{t)dt = 50 for all cr > 0. As cr gets smaller, f{t) becomes more concentrated 
around the /x^'s. It approaches a constant of 50 as a gets very large, and it approaches a constant baseline 
of 10 with additional narrow spikes at each as a gets very small. Our simulations do not venture to these 
extreme bandwidths, however. 

Figure 4 uses bandwidths of cr = 0.036, 0.020, 0.012, 0.008, from top to bottom, respectively. 

Data generation. Much like Figure 3, the motivation for this sampUng procedure is to make simulated 
datasets of different bandwidths be as similar as possible. Each spike train is a superposition of many 
independent Poisson processes that combine together to give the final intensity function / described above. 
We will describe the process used to generate spike train 1 for each bandwidth. The generation of spike train 
2 proceeds identically and independently. 

The first piece corresponds to the 10 Hz baseline rate. We generated 100 iid samples from a 10 Hz 
homogeneous Poisson process over (0, 1). Each dataset (with different bandwidth) shares these spike times. 
Then for each trial (fc = 1, . . . , 100) and each /ij (j = I, . . . , 40) we generated iid Poisson random variables 
(not processes) with mean 1. Call these random variables Mkj. These 4000 numbers are fixed and shared 
across all datasets. For each a, k, j we generated M^j iid samples from the probability density function 
(pdf) 
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The point is that the number of spikes Mkj contributed by a given component j = 1, 2, . . . , 40 to a given 
trial /c = 1, 2, . . . , 100 is the same for all band widths a. We did this by sampling from the double exponential 
distribution with pdf exp(|t — ^j|/((j/\/2))/(2(T/\/2) and then taking the samples modulo 1 (which wraps 
samples outside of the unit interval back onto it). For bandwidth a and trial k the sorted collection of these 
samples and the original 10 Hz baseline spike times become the spike times on trial k for neuron 1 in the 
bandwidth a dataset. 

Data processing. The data processing is identical to that in Figures 1-3. 

2.7 The jitter-corrected cross-correlation histogram 

Jitter-correction is described in the Figure 2 section above. It is exactly analogous to shuffle-correction, ex- 
cept that it uses jitter, instead of trial-shuffling, to create the surrogates. We note that the true expected value 
CCH after jittering can be computed exactly without using the empirical mean of jittered surrogates. We 
do not describe the details here, since explicit creation of surrogates seems necessary to construct pointwise 
and simultaneous acceptance bands. 

3 Variations on the jitter theme 

3.1 Rate of change of intensity functions 

Recall that if we consider only inhomogeneous Poisson processes, then interval jitter tests whether the 
time- varying intensity function is piecewise constant over jitter windows. Sticking to the inhomogeneous 
Poisson model for motivation, a more realistic and interesting hypothesis about the time- varying intensity 
function would be that it is (essentially) piecewise linear, with a given bound on the percent change 
that can occur within any one jitter interval. Supplementary Figure S|2] reproduces the first intensity 
function from the experiment discussed in Section 2.1, with a piecewise-constant approximation over 20 
ms jitter intervals (panel A), and a piecewise-linear approximation, also over 20 ms intervals, but with 
the additional constraint that firing rate not change by more than 25% within any one interval (panel B). 
Processes generated from the piecewise linear intensity function would be, for all intents and purposes, 
indistinguishable from processes generated from the original intensity function. 

We provide some details below about how to modify interval jitter to allow for non-uniform jitter 
distributions, such as piecewise linear up to a certain amount of change as suggested above. As with 
interval jitter, the null hypothesis is much more general than the set of Poisson processes with piecewise- 
linear intensity function. The null hypothesis is in terms of the conditional distribution on spike locations, 
given the numbers of spikes observed in the 6 ms intervals. Conditioned on these numbers, the placements 
of spikes are assumed independent, with likelihoods that vary linearly within an interval and have a 
specified bound on change over the interval. (Uniform relocation is the special case in which the percent 
change within a jitter interval is bounded by zero.) The hypothesis can be tested in either member of 
a pair of recorded neurons. Spikes in the selected neuron are randomly relocated; the other neuron's 
spikes remain fixed. Synchrony is defined with respect to the second ("reference") neuron: a spike in 
the selected neuron is synchronous if it falls within one ms of any of the (fixed) spikes in the reference 
neuron. As with interval jitter, the observed synchrony count is compared to the population of counts 
generated by the randomization process. 

For each jitter interval of the selected neuron, the linear intensity function that maximally promotes 
synchronies is computed. These "worst-case" intensity functions define the spike-relocation distribution 
("tilted jitter"). Formulas for computing worst-case intensities under the piecewise-linear hypothesis, and 
for various generalizations, are provided in Section B of the Mathematical Appendix. The point is that 
the resulting right-tail probabilities on synchrony counts are at least as big as the tail probabilities for any 
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Poisson intensity function 




1 second 



Figure S2: Piecewise approximations of a rate 
function. The rate function shown in the upper-left 
panel of Figure 1 A is approximated by a piecewise 
constant (panel A) and piecewise linear (panel B) 
function, with respect to a fixed 20 ms partition- 
ing ("jitter intervals"). The linear approximation is 
constrained to change by no more than 25% within 
any 20 ms interval. Subject to this constraint, the 
piecewise linear function shown in B best approxi- 
mates the original. Spike processes generated from 
the linear approximation would be nearly indistin- 
guishable from processes generated from the origi- 
nal rate function. 



other local intensity functions that are consistent with the null hypothesis. Therefore, a test performed 
at a given level of significance under the single conditional hypothesis defined by these particular local 
intensity functions is simultaneously valid for any other hypothesis in the compound null. In the search 
for evidence for fine-temporal structure it is conservative to resample spikes using these worst-case prob- 
abilities. 

Non-uniform interval jitter null hypothesis. There are many ways to relax uniformity in interval jitter. For- 
mally, we simply replace the word "uniform" in the specification of the null hypothesis with the description 
of another class of distributions that makes sense. For example, we might replace "uniform" with "a dis- 
tribution that has at most 0.01 total variation distance from uniform". In practice, however, replacing the 
uniform distribution with some other class of distributions can create computational challenges. The Math- 
ematical Appendix discusses this in more detail. Here we simply provide the details for the comments about 
non-uniform jitter in the main text and elsewhere in this supplement. 

We experiment with replacing uniform with linear (see Proposition A.6 and Section B of the Mathemat- 
ical Appendix). For computational reasons, we fix the spike times of neuron 2 and only jitter the spikes of 
neuron 1. In each jitter window (of each trial for neuron I), given that there are s spikes in that window, the 
null hypothesis dictates that these spike times are (conditionally) independent from the spike times in other 
windows and that the density on these s spike times is iid with common pdf 

fe{x) = \{l + e[^^-^^^\{a<x<a + 5] for some \0\ < ^ 

where the jitter window is [a,a + 5) and where e > is a bound on the allowable fraction of change in firing 
rate over any jitter interval. The parameter 9 is allowed to be different in each jitter window, but it cannot 
exceed the specified bound. The strange looking form of the bound on\6\ comes from the fact that any such 
^has 

max — 1 < e 

x,y(i[a,a+5] feKV) 

The lefthand side (x 100%) is the maximum percentage change in firing rate over the jitter interval [a, a + 
5]. The null hypothesis constrains this to be less than e (x 100%). It is easy to verify that fe{x) > 
and integrates to one on [a, a + 5). The null hypothesis has two parameters that control the scientific 
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interpretation: the jitter window length (S) and the maximum fractional change in firing rate over any jitter 
window (e). The case e = corresponds exactly to the original uniform null. As e increases, the null 
hypothesis enlarges to allow increasingly non-uniform processes. In Section 3. 1 of the main text, we mention 
allowing for various amounts of percentage change in firing rates within a jitter window. These percentages 
refer to e x 100%. 

Also for computational reasons, we modify the synchrony test statistic for non-uniform jitter. It is now 
(see the supplementary discussion for Figure 2) 

i=l 

which is the number of spikes in neuron 1 that participate in a synchronous pair (instead of the total number 
of synchronous pairs — these two measures tend to be quite similar). For this test statistic, it is computa- 
tionally straightforward to choose the 9 in each jitter window that creates the most amount of synchrony 
while satisfying the chosen e bound (see Section B of the Mathematical Appendix). Then we can jitter 
using the corresponding fg in each jitter window (instead of uniform jitter). The resulting jitter distribu- 
tion of synchrony is maximally conservative, meaning that we get a valid p- value for the non-uniform null 
hypothesis. 



3.2 Re-sampling patterns 

Patterns. Fix a parameter R>0. Consider a spike train !!<•••< Y^. Define Yq = — oo and Iat+i = oo 
for notational convenience. We can uniquely partition the spike train into patterns as follows: Yj, ... ,Yk is 
a pattern if and only if 

Yj - Yj_i > R and 

Yk+i-Yk>R and (1) 
li+i —Yi<R whenever j < i < k 

where l<j<k<N,m which case Yj is the starting time of the pattern. With a slight abuse of 
terminology, we will say that two patterns are the same if they have the same number of spikes and identical 
sequences of interspike intervals. Two spike trains Yi < ■ ■ ■ < Ypf and Y( < ■ ■ ■ < Y^, have the same 
sequence of patterns if and only if 

N = N' and 

y^+i - = yZ+i - y/ whenever y+i - y < i?, and (2) 
i^'+i -Yi> R whenever y^+i -Yi> R 

where 1 < i < A''. Notice that Y and Y' have the same patterns if and only if each spike in Y has the same 
length R history as the corresponding spike in Y' , where the length R history of a spike Yi is the (perhaps 
empty) list of all spike times relative to Yi that occur in the i2-length interval immediately preceding Yi. If 
there are 3 spikes in this interval, then the history would be (1^-3 — Yi, Yi-2 — Y^, YJ-i — Yi). 

The pattern-encoding statistic. If y^ i < • • • < Yi^p^,. is the spike train from neuron i, it is straightforward 
to verify that the following Ni x 2 matrix only encodes the sequence of patterns and the sequence of length 
S interval jitter windows that contain the start of each pattern: 



Vi{i, 1) = l{Yi,e - Yi,e-i > R} and Viii, 2) = 



[Yi^e/S\ ifVi{i,l) = l 
Yi,e-Yi,i-i ifVi{i,l) = 



for £ = 1, . . . ,Ni, where we define y^o = — oo for notational convenience. The first column of Vi indicates, 
with a one as opposed to a zero, which spikes begin a pattern. The second column of Vi gives the jitter 
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window for spikes that begin a pattern and gives the preceding interspike interval for spikes that are within 
a pattern. 

The pattern jitter null. Fixing the history parameter R and the jitter window length 5, the pattern jitter 
null is 

Hq: The conditional distribution of the data is uniform given the matrices Vi for all i. 

Monte Carlo pattern jitter A fast algorithm for sampling from the pattern jitter null distribution is 
described in [W]. We use the discretized version of the algorithm below, which constrains all spike times to 
a fine grid. In many real applications, spike times are recorded at some fixed precision, and jitter surrogates 
should also be constrained to this same precision. Monte Carlo pattern jitter surrogates can be used just 
like trial-shuffled surrogates or interval jitter surrogates to create p-values, acceptance bands, and visual 
"corrections" of CCHs or other graphical displays. 

Figure 5 

The observed spikes (top row of each subfigure) are the spike times from neuron Nl (see Section 4) recorded 
between 3 and 4 seconds after the beginning of the experiment (a few seconds before the beginning of the 
first trial). They were selected for illustration purposes. The spikes were discretized at l/30th ms (30000 
bins per second) and the discretized version of the pattern jitter sampling algorithm was used to create the 
example surrogates for different combinations of 5 and R. 

Pattern jitter for bursting neurons 

Here we experiment with a modification of the data sets used in Figure 3 (see Section 2.5) to illustrate the 
utility of pattern jitter for non-Poisson neurons. In particular, we do a cross-correlation analysis of strongly 
bursting (artificial) neurons and show that pattern jitter is important in this context for preserving the nominal 
level of hypothesis tests. This is interesting because the expected number of synchronies should be largely 
unaffected by the auto-coiTclation structure of individual neurons. Nevertheless, the distribution is affected 
and so the auto-correlation must be accounted for when designing proper hypothesis tests. 

We begin with the identical data set used for Figures 1 and 2. The notation that we use here is described 
above in Section 2.5 for Figure 3. Define di^k = L^j,fc/3j to be the number of spikes in trial k for spike train 
i divided by 3 and rounded down to the nearest integer. For each f = 1, 2 and k = 1, . . . , 100, we remove 
2di^k spikes uniformly at random from that trial and that neuron. Then we choose di^k of the remaining 
spikes (again, chosen uniformly at random; this will be all of the remaining spikes if dj ^ is a multiple of 3) 
and make them bursts of 3 spikes: we leave the original spike; we add a new spike uniformly in the interval 
(8, 9) ms after the original spike; we add another new spike uniformly in the interval (16, 17) ms after the 
original spike. In the event that one of the new spikes lands outside of the trial interval, we repeat the entire 
process for that trial of that neuron. This procedure takes the original data and makes the neurons burst (with 
high probability) in a succession of three spikes separated by about 8 ms per spike, while leaving the total 
number of spikes on each trial the same, and while leaving the time-varying, trial-varying intensity of the 
process largely unchanged. These are highly non-Poisson spike trains. 

Beginning with this as the baseline dataset for spike trains 1 and 2, and using the same spike train 3, 
we repeat the thinning/injection procedure of Figure 3 to create datasets with varying degrees of zero-lag 
injected synchrony. Then we repeat the analysis of Figure 3, both with 5 = 20 ms interval jitter (Figure 
[3]\) and with 5 = 20 ms and i? = 10 ms pattern jitter (Figure |3^). The simultaneous acceptance bands 
are violated with zero injected synchrony using interval jitter. This is a correct rejection — the bursting is 
a type of fine-temporal structure — but it is not what we normally want with a cross-correlation analysis. 
Implicit in a cross-correlation analysis is the understanding that the resulting conclusions are about the 
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A 20 ms interval jitter correction 
cross-correlation histogram lag-0 distribution 



B 20 ms pattern jitter (R=1 ms) correction 
cross-correlation histogram lag-0 distribution 
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Figure S3: Interval versus pattern jitter corrected CCHs for bursting neurons. Compare with Figure 3 in the 
main text. The data set for this figure is described in the Supplementary section Pattern jitter for bursting neurons. 
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statistical relationship between two neurons. We wanted to see synchrony (or a lack thereof) without artifacts 
introduced by the auto-correlation structure of the spike trains. Pattern jitter, by preserving the bursts in the 
resamples, correctly calibrates the acceptance bands and does not report a rejection for the zero injected case. 
Again, this is a correct conclusion for pattern jitter, because pattern jitter is a different null hypothesis that 
allows for certain types of auto-correlation structure (including the types of bursting introduced here). Unlike 
interval jitter, however, the correct conclusion from pattern jitter corresponds with the standard scientific 
interpretation of significant coiTclation. 

As we add injected synchronies, pattern jitter behaves similarly to interval jitter from Figure 3 and begins 
to flag as significant the injected zero-lag synchronies (as it should — injected synchronies are not in the 
pattern jitter null hypothesis). Statistically speaking, pattern jitter has similar power as interval jitter for 
detecting injected synchronies in this case. 

There are, of course, much more striking examples for illustrating the importance of pattern jitter, but 
the role of pattern jitter in these examples tends to be so obvious that detailed simulations are superfluous. 
In particular, if the test-statistic depends highly on the auto-correlation structure of individual spike trains, 
such as the repeated spiking motifs of Section 4. 1 , and if the short-range auto-correlation structure is not of 
scientific interest, such as refractory periods and bursting, then pattern jitter, or a similar null hypothesis, is 
obviously important in order to focus inference on the features of interest, while ignoring the short-range 
auto-correlations of non-interest. See lH for examples. 

4 Jitter analysis of three motor-cortical neurons 

Neurophysiological methods 

Three neurons (designated Nl, N2, and N3) were selected from simultaneous multi-neuronal recordings 
made in primary motor cortex (MI) of the monkey. Single-unit activity was recorded from a multi-electrode 
array composed of 100 electrodes (1.0 mm electrode length; 400 fim inter-electrode separation) that was 
chronically implanted in the arm area of MI of one macaque monkey (Macaca Mulatto) — see [92] for 
details. During a recording session, signals from up to 96 electrodes were amplified (gain, 5000), band- 
pass filtered between 0.3 Hz and 7.5 kHz, and recorded digitally (14-bit) at 30 kHz per channel using a 
Cerebus acquisition system (Cyberkinetics Neurotechnology Systems, Inc., MA). Only waveforms (1.6 ms 
in duration; 48 sample time points per waveform) that crossed a threshold were stored and spike-sorted using 
Offline Sorter (Plexon, Inc., Dallas, TX). The monkey was operantly trained to move a cursor appearing 
above the monkey's hand location to targets projected onto a horizontal, reflective surface in front of the 
monkey. At any one time, a single target appeared at a random location in the workspace, and the monkey 
was required to move to it. As soon as the cursor reached the target, the target disappeared and a new target 
appeared in a new, random location. All of the surgical and behavioral procedures were approved by the 
University of Chicagos lACUC and conform to the principles outlined in the Guide for the Care and Use of 
Laboratory Animals (NIH publication no. 86-23, revised 1985). 

Figure 6 

Spike times from the first second of each successful trial were selected. This results in 3 simultaneously 
recorded spike trains, each with 391 one-second trials. Spike times in this data set are discretized to l/30th 
ms. PSTHs and CCHs are computed identically as in Figure 1 (except with 391 trials instead of 100 trials). 
Trial shuffling and the resulting data processing are also identical to Figure 1 . 

Interval jitter. For interval jitter (middle column) we partitioned each trial (i.e., the first second of each 
trial) into 50 jitter windows of length 20 ms each. (This implies that our jitter analysis is also conditional 
on the starting times of successfully completed trials, since we used this information to select the data.) 
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Each 20 ms jitter window has 600 time bins of l/30th ms each. We jittered each spike by uniformly 
and independently choosing one of the 600 bins without replacement in its respective jitter window. (The 
modeUng assumption here is that a time bin cannot have multiple spikes, which is realistic for such small 
time bins. If the observed data has no time bins with multiple spikes, then sampling without replacement 
agrees with i? = pattern jitter.) 

Pattern jitter. For pattern jitter (right column) we proceeded as in interval jitter, but used Monte Carlo 
samples from the i? = 0.1 seconds pattern jitter null. Spike times outside of the first second of each trial 
were ignored. (In practice, a more careful procedure would be to condition on these ignored spike times — 
or better: pattern jitter the entire recording — so that patterns extending outside of the trial boundaries are 
still preserved. The differences are negligible in this dataset using the synchrony test statistic. We ignored 
spikes outside of the first second so that the dataset would remain in the same format as our simulation 
experiments.) 

4.1 Beyond synchrony: Other measures of precision 
Figure 7 

Maximum repeating triplets test statistic. Let 11 < • • • < Iat be the sequence of spike times discretized to 
l/30th ms and measured in units of milliseconds, so that round(y£ — 1^) is the closest integer number of 
milliseconds that elapse between the fcth and £th spike. For each i,_7 G {0, 1, ... , 1000}, we first compute 
the number H{i,j) defined as 

7V-2 7V-1 N 

H{hj) = Y. E E l{round(y^ - n) = i}l{round(y„ - y^) = j} 

k=l i=k+l m=e+l 

X l{yfc, Y£, Ym all occur during the same successfully completely trial} 

which is the total number of three (not necessarily consecutive) increasing spike times such that the sequence 
of two intervals between the spike times (rounded to the nearest millisecond) is (i, j) ms and such that all 
of the spike times come from the same successfully completed trial. For computational simplicity, we only 
consider intervals up to 1000 ms. Each (i, j) pair defines a "triplet". H{i,j) counts how many times triplet 
(i, j) repeats in the data. (The data had an absolute refractory period of more than 1/2 ms, so H{i,j) = 
whenever i = 0ot j = 0.) 

The test statistic used in Figure 7 is the maximum number of repeating triplets, namely, 

max H(i,j) 
ije{o,i...,iooo} 

Notice that pattern jitter with R> d + 0.5 milliseconds exactly preserves the value of H{i, j) for i, j < d. 

Pattern jitter We generated surrogates from various pattern jitter null hypotheses with differing com- 
binations of R (history length) and 5 (jitter window width) as before. We conditioned on all spike times 
that were not part of a successfully completed trial, meaning that we held these spike times constant and 
included them in the definition of a pattern (but they were not included in the test statistic). In fact, we also 
conditioned on the first and last spike time within in trial, to better ensure that there was no unexpected in- 
teraction between our choice of trial boundaries and the test statistic. It turns out that none of the additional 
conditioning matters in the experiments here, but we feel that it is always good practice to err on the side of 
caution, especially when using complicated test statistics like the maximum number of repeating triplets. 

One can, of course, over-condition to the point that the surrogates do not have enough variability to 
draw meaningful conclusions (loss of power). For our dataset the maximum value of H was achieved with 
(i, j) = (23, 27), so that R > 27.5 ensures that max if can only increase with jittering. We can thus never 
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reject the null with R > 27.5 using this test statistic (and right-sided rejection regions). Hence, choosing 
R > 27.5 certainly results in over-conditioning, and the problem likely begins for somewhat smaller R, as 
well. 



4.2 Temporal resolution and the neural representation of behavior 



Likelihood ratio tuning curves 



One motivation for using likelihood ratio tuning curves 



e{d) 



P{D = d,S = l) 



P{D = d\S = 1) _ P{S = 1\D = d) 
p{D = d) ~ P(5 = 1) 



P{D = d)P{S = 1) 



is the observation that 6{d) is proportional to the more familiar tuning curve P{S = 1\D = d) through 
multiplication by P{S = 1). Multiplication by P{S = 1) conflates synchrony, per say, with directional 
tuning, and therefore complicates the interpretation of jitter or other resampling methods that might be 
used to calibrate the relationship between synchrony and direction, or, more generally, between fine- 
temporal events and other measures of behavior. 

A second way to look at the role of likelihood ratios is to think of directional tuning as a collection 
of binary classification problems, one such problem for every direction d. Fix a single direction d and 
consider the problem of guessing whether or not a certain event has occurred given an observed movement 
in that direction. The "event" could be a spike in a particular neuron or the occurrence of a synchrony 
across two neurons, at about 100 ms (say) before the observed movement. Formally, the problem is to 
guess between 5 = 1 and 5 = when the event of interest is a synchrony of spikes. In both cases, 
the evidence is the movement direction, D = d. These are binary classification problems and by the 
Neyman-Pearson lemma ||96l |95l the optimal classifier, in terms of the best tradeoff between sensitivity 
and specificity, is achieved by comparing the likelihood ratio, 6{d), to a threshold. A particular threshold 
fixes a particular sensitivity/specificity pair; the set of all thresholds defines the optimal ROC ("receiver 
operating characteristic") curve. 

For the reader more familiar with classification based on the ratio P{D = d\S = 1)/P{D = d\S = 
0), we note that 



Data preprocessing. Time was discretized to 1 ms bins. The (x, y) position of the hand (cursor) was 
smoothed with a Gaussian kernel using a bandwidth (standard deviation) of 25 ms and also shifted back in 
time 100 ms. The offset of 100 ms was chosen to approximate the delay between motor cortex and observed 
kinematics. We will use {x{t),y{t)) to denote this smoothed and shifted position vector at time bin t. The 
movement direction D{t) at time bin t was the angle of the vector {x{t + 1) — x{t — l),y{t + 1) — y{t — 1)). 

For each neuron i = 1, 2, 3, let i, . . . , li,Ar. denote the spike times of neuron i. For each pair of 
neurons i, j, we use Sij{t) to denote the zero-one process of spike or no spike from neuron i in time bin t 
that happens to be within 1 ms of a spike from neuron j, namely, 

5jj(t) = G time bin t and \ Yi^£ — Yj^k\ < 1 ms for some £ = 1, . . . , Ni and k = 1, . . . , Nj^ 

When jittering, we jitter spikes before the 1 ms discretization step (i.e., using the original l/30th ms 
time bins), and then re-compute the 5 processes (which are discretized at 1 ms bins). 




where 93 is the monotone function ip{x) = x/ {P{S = I 
— change the threshold and get the identical classifier. 



0) + P{S = l)x). Hence the ratios are equivalent 



Figure 8 
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Estimating the likelihood ratio tuning curves. We abuse notation and use P[D = d), which typically 
means the probability that D equals d, when, in fact, we mean the probability density of D at d. The 
same comment applies for related quantities, like P{D = d\S = 1). Note however, that S is discrete, so 
P{S = 1) and related quantities are actual probabilities. 

We estimate the density P{D = d) using the list of values 

V = [D{t) : t is within a successfully completed trial) 

We use kernel density estimation applied to the list of values in V with a Gaussian kernel with bandwidth 
cr = 0.1 radians (which is approximately 6 degrees) that wraps around the interval [— 7r,7r). Wrapping 
enforces the natural periodicity in P{D = d) and prevents unnatural tapering for d near ibTr. The formula 
for our estimator is 

1 °° 1 

PiD = d)=0\Yl Yl -<P{id-u + 27ri)/a)l{de[-n,7r)} 

where (j)(z) = exp(— 2:^/2)/\/27r is the standard normal density function. This may look complicated, but 
except for d near itTr, it is virtually identical to the standard Gaussian kernel density estimator 



l.\2y{{d-u)/a)t{de[-n,n)} 



' ' Me© 

In fact, since our bandwidth is small, only I = —1, 0, 1 contribute to the sum in the formula for P{D = d), 
and a simple trick to compute P{D = d) is to use standard Gaussian kernel density estimation with the 
dataset P U (P — 27r) U (P + 27r) (and then multiply the resulting estimator by 3). Note that P is a list of 
values, meaning that it can have repeat values. 

We estimate P{D = d\Sij = 1) using the list of values 

Pjj = (-D(t) : t is within a successfully completed trial and Sij{t) = l) 

again with Gaussian kernel density estimation with bandwidth 0.1 radians and with wrapping around 
[-7r,7r). 

Our final estimates of the likelihood ratio tuning curves are the ratios of these kernel density estimators, 
namely 

P{D = d\Si,, = 1) 



i,jid) 



P{D = d) 



Figure 8 shows 0i,2, ^1,3, and 6'2,3, respectively, from left to right. 

Pattern jitter. We use the identical pattern jitter procedure from Figure 7, but we independently jitter each 
of the three spike trains, not just N2. For each jittered dataset, we compute new likelihood ratio estimators 
Oi^j according to the identical procedure. These estimators can potentially change a lot, because jittering can 
completely change which spikes are synchronous. We created 1000 independent jittered datasets leading to 
■ ■ ■ , (^if^^ where 6^^ comes from the original dataset and 6fj comes from the fcth jitter-surrogate 

dataset. 

Acceptance hands and p-values. For the purposes of a test- statistic for hypothesis testing or for the 
purposes of constructing acceptance bands, there is no fundamental distinction between a CCH, say c(r), 
and an estimator of a tuning curve, say 9{d). We constructed acceptance bands for 9 exactly as for the 
CCH, with one difference. For the simultaneous acceptance bands, we first transformed to a logarithmic 
scale, namely, log 9, to remove some of the inherent asymmetry between large and small values of the like- 
lihood ratio (which makes the standardization using u and s for the simultaneous bands behave better). We 
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computed simultaneous acceptance bands for log 9 and then transformed the bands back to the appropriate 
scale with an exponential operation. Also, for Figure 8, we do not apply any "correction" (i.e., subtracting 
the mean of the surrogates to aid visualization) because we want to emphasize the the tuning curves vary 
strongly with direction. 

The pattern jitter null hypothesis was not rejected using the simultaneous test for any of the three 
neuron pairs. Although the estimated tuning curves do, in fact, exceed the pointwise acceptance bands 
at a few directions, we had no way to identify these directions a priori before seeing the data, so the 
simultaneous test is more appropriate. (The situation is somewhat different for CCHs, because lag-0, 
corresponding to precise synchrony, can be identified as special a priori, and pointwise rejection at lag- 
is better justified.) Even under the null hypothesis there should be some excursions outside of the 
pointwise bands (which we see, as expected), because each corresponds to a separate test of the same 
null hypothesis. For the simultaneous bands, however, under the null hypothesis, the probability that 
the entire tuning curve falls within the light-grey bands is at least 0.95. We do not expect to see any 
excursions outside of these bands. 

5 Summary and Discussion 

Additional references can be found in the references section of this Supplement. The list is not meant to be 
exhaustive. 
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Supplementary References 



Supplementary references are broken into categories in the following order: On trial-to-trial variability; 
On jitter: methodology; On jitter: applications to neurophysiology; On spike timing; Statistical analysis 
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Mathematical Appendix 



In the main text, we illustrated the idea and purpose of jitter largely through a collection of spike-resampling 
algorithms, and for pedagogical reasons we connected the resampling algorithms to statistical concepts 
in words. In parallel, statistical methodologies, if properly formulated, can be used as tools for making 
mathematically-precise, probabilistic, statements. Here we provide this foundational view by placing those 
resampling algorithms used in the main text in correspondence with the statistical models, and null hypothe- 
ses, with which they are associated. 

In statistical terms, the approach may broadly be understood as an instance of conditional inference. 
Null hypotheses are specified via conditional distributions on the data. Classes of conditional distributions 
express, as null hypotheses, mathematical models of various scientific questions concerning spike timing. 
The null hypotheses justify the resampling algorithms and related methods of inference. 

The key ideas are present in the simplest case: discrete time, single neuron, uniform interval jitter. 
We begin with a brief description of that special case, and then proceed with a systematic development of 
underlying mathematical principles. 

A finely discretized spike train can be represented as a binary vector, say x = {xi,. . . ,Xm), where 

= 1 if there is a spike in time bin i, and Xi = otherwise. Fix a jitter window width, say A time bins, 
and let Nk{x) be the number of spikes in the time bins {{k — 1)A -|- 1, . . . , A;A}, that is 

kA 

i={k-l)A+l 

where we use the convention that = for i > m. The sequence of spike counts is AT" (a;) = 
(A'^i(a;), . . . , Nr{x)). The test statistic is any function T that assigns a number to each spike train. If 
X = {Xi, . . . , Xm) is a random spike train, then the uniform interval jitter null hypothesis is 

Ho The distribution of X satisfies 

Pr(X = x\NiX) = n)) = ^^l{N{x) = n} 

for all X and n, where Z{n) = J2x '^{^{x) = n} is a normalization constant. In other words, given 
N{X) the conditional disttibution of X is uniform. 

Each jitter window width, A, gives a new null hypothesis (via a new definition of the sequence of spike 
counts, N); larger values of A impose more severe restrictions on temporal resolution. 

Hq is a well-defined null hypothesis: every possible distribution on m-length binary vectors either satis- 
fies Hq or it does not. Testing this null hypothesis is elementary: the uniform interval jitter procedure gives 
exact p-values for any choice of test statistic. There are essentially no remaining statistical or mathematical 
decisions. There are, however, crucial scientific decisions. 

• What is the choice of A? Or is A a parameter to be explored? 

• What is an appropriate T? 

• Are Hq and T an appropriate abstraction of the scientific question? 

The scientist must be able to answer these questions in order to use jitter appropriately. Relating these ques- 
tions to the typical neuroscientific uses of jitter is the focus of the main text. The focus of the Mathematical 
Appendix is to elaborate on some of the mathematical details and generalizations. 
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A Resampling procedures and statistical hypothesis testing 



The Appendix is organized as follows. In Section A.l we review standard definitions and properties of 
hypothesis testing and of exchangeable random variables. In Section A.2 we provide a formal description 
of the resampling algorithm for uniform interval jitter and relate the algorithm to an exact hypothesis test. 
A key mathematical justification is that, under the null hypothesis, the original and resampled data sets are 
exchangeable random variables. In Section A.3 we re-present basic jitter and point out that in the case of 
basic jitter the exchangeability condition is violated. For this reason, we advocate the view of basic jitter 
as an approximate, or heuristic, variation on interval jitter. In Section A.4[ we present pattern jitter as a 
generalization of uniform interval jitter which incorporates constraints on interspike intervals, and state the 
corresponding null hypothesis. The justification of the resulting hypothesis test is similar to that of interval 
jitter. In Section A.5[ we move beyond uniform re-sampling. Here we describe the null hypothesis for tilted 
jitter, whose chief characteristic is that the conditional distribution of a spike's position within a window 
is non-uniform (in contrast to uniform interval jitter). Here we develop variations of tilted jitter in which 
inference is developed for the special case of analyzing spike synchrony. Some additional mathematical 
details associated with tilted jitter are contained in Section |B] Finally, Section A.6 develops deterministic 
(non-Monte Carlo) variations on some of the tests previously described. 



A.l Preliminaries on hypothesis testing and exchangeability 
A. 1.1 Statistical Hypothesis Testing 

Given a data set X modelled as a random variable, a null hypothesis Hq is a class of probability distributions 
on X. A hypothesis test is a test of the hypothesis that the distribution governing X is contained in Hq. 
Formally, we will express hypothesis tests in terms of p-values with respect to an explicit null hypothesis 
Hq. a statistic p{X) is a p-value for a null hypothesis Hq if it has the property that 



Pt{p{X) < u) < u,y < u < I, (3) 

under the condition that the distribution Pr is in Hq. By custom, p- values are reported when hypothesis tests 
are used. They implicitly determine critical regions: define 

C{a) = {X : p{X) < a}. (4) 

By definition, if p is a p-value (satisfying Q) for Hq, then, under Hq, 

Pi{X e C{a)) < a, (5) 

so that C{a) is a critical region for an a-level test of Hq. This is the standard definition of a test. We note 
that, in what follows, p will sometimes depend not only on the observed data X, but also on Monte Carlo 
resamples, say, X^^\ . . . , X^'^\ In this case, we require that Pr(p(X, X^^\ . . . , X^"^^) < u) < u for 
all u G [0, 1] whenever Hq is true, where the probability is with respect to the joint distribution on X and 
the Monte Carlo resamples. The critical region is similarly modified. When the number of Monte Carlo 
samples, J, is large, the additional randomness introduced by using a Monte Carlo procedure is negligible. 
We note that the Monte Carlo randomness only affects power: rejecting Hq when p < a correctly bounds 
the type I error probability below a for any choice of J. 

The set-complement of the critical region is the acceptance region. Acceptance regions for single-tests 
are in correspondence to what we refer to as pointwise acceptance bands when multiple tests arise from an 
indexed family of tests and are plotted graphically. Simultaneous acceptance bands are developed in 2. 1 of 
this Supplement, and discussed elsewhere in the Supplement and the main text. 
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A. 1.2 Exchangeability 



A finite collection of random variables Yi,Y2, ...,Yn is exchangeable if its joint distribution is invariant to 
permutations of its arguments. That is, 

Pr((yi,...,y„) G^) =Pr((y,(i),...,y,(„)) G^) (6) 

for all set^^ and all permutations vr of the index set (1, n)|^ 

A recurring idea in the appendix is that, under a suitably-formulated null hypothesis, the collection of test 
statistics derived from the original spike data sets and the resampled surrogates are exchangeable. Therefore, 
any test of the exchangeability of the collection of test statistics will be a test of that null hypothesis. The 
basic hypothesis test is the following test of exchangeability. (Below we use the symbol 1 to denote the 
indicator function]^ 

Lemma A.l. Suppose that Yi, .., 1^ are real-valued random variables. The statistic 

n 

is a p-valuefor the null hypothesis that Yi, ...,Yn are exchangeable^ 

Independent and identically distributed (i.i.d.) random variables are exchangeable. We will also make use 
of the fact that conditionally independent and identically distributed random variables are exchangeable. 

Lemma A.2. IfYi,...,Yn are conditionally i.i.d. (or, more generally, conditionally exchangeable), given a 
random variable Z, then Yi, ...,Yn are exchangeable^ 



A.2 Uniform Interval Jitter 

We begin by modelling a single discretized spike train as an m-length Bernoulli process: a discrete time 
zero-one process with outcomes in {0, 1}™. A sample outcome is of the form x := (xi, ...,Xm) G {0, l}™, 
where Xj indicates spike or no spike in time bin j. In interval jitter, the time axis is partitioned into k 
equally-sized intervals consisting of A bins eachj^Fj := {(i — 1)A + 1, zA} denotes the set of time bins 
of the i'th window. 

'To avoid cluttering the exposition, we leave any measure-theoretic qualifications as implicit. In any case, nothing unusual 
comes up. Where spike times are modelled in a continuum, intervals and spike counts in intervals are, naturally, taken as finite. 
Wherever they appear, densities and conditional densities / are for spike times, and are used in the sense of elementary probability. 

permutation of an ordered set is a rearrangement of the set. For example, tv = (4, 3, 1, 2) is a permutation of (1, 2, 3, 4). 
Here we are writing 7r(l) = 4, 7r(2) = 3, etc. 

^1{j4} denotes the indicator function of the set A, which takes the value 1 when the sample outcome falls in A, and otherwise. 
In particular, Y^,"^^ l{Yi > Vi} = #{i : > Vi, 1 < i < n} 

■*For completeness, here is a proof of this well known fact. Exchangeability implies that Pr(p < a) = Pr(X]r=i ^{Yi > 
Yk} < na) for each k and therefore Pr(p < a) = n"^ Pr(Er=i ^{Yi > Y^} < na) = n^^ E [J^I^-l 1 {EILi HYi > 

Yk} < na}] < 'E[na] — a, where E denotes expectation. (The steps are easiest to verify using the order statistics, which are 
the result of a permutation o(l), o(2), . . . , o{n) such that ^0(1) < ^0(2) < ■ • ■ < Yo(n) ) 

Again, here is a proof of this well known fact. Pr((yi, . . . , y„) £ A) ^ E[Pr((yi, . . . , y„) G = 
E [Pr((Y'^(i) , . . . , yir(„)) G ^1^)] ~ Pr((y7r(i), • • • , Y^(^„)) G where the middle equality follows from the assumption 
that, given Z, the F's are exchangeable. 

*In neurophysiological experiments described in the accompanying article, spike trains were discretized into bins of 1/30 ms. 
When the observation interval is not divisible by A, there will be a final interval of size less than A. To keep things simple here, 
we will treat any spikes in that interval as frozen. Formally, we are additionally conditioning on the location of spikes in the final 
interval. But for the presentation here we will ignore this detail. 
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Define the sequence of spike counts N{x) = {Ni{x), N2{x), ...,Ni{x)) as tlie interval-counts: 

N,{x) ■.= Y,Xj (8) 

Finally, let T(a;), the test statistic, be a fixed function that assigns a real number to a spike train (e.g. the 
number of synchronies relative to a second, "reference," neuron). 

Let X := {Xi, ...,Xm) be a random spike train drawn from an unknown probability distribution, de- 
termined by the experimental setup. A surrogate data set X*-*^ := {x[^\ X2 \ ...,Xm) is produced by 
sampling from the (conditional) distribution: 

P(X» = x\N{X) = n) = ^ = , (9) 

That is, X^*^ is drawn independently and uniformly from the subset of spike train outcomes (i.e., of {0, l}™) 
that satisfy N{X^^^) = N{X). Iterating this procedure J times, we obtain J (conditionally) independent 
samples X^^\ X^'^\ X^'^\ drawn from the distribution The computational procedure for sampling 
from (j9]) is straightforward: proceeding independently window -by-window, assign Ni{X) spikes to window 
i by sampling from the uniform distribution on the {j^^x)) possible assignments. 
Finally, we define 

= i + E.i.iCTx"))>T(x))^ 

Proposition A.3. (Uniform Interval Jitter) 

Define Hq as 

Hq : the conditional distribution of X given N{X) is uniform, meaning that for all x and n: 

Pr(X^x|iV(X) = n)= "^'r';"' „y dO 
Then, p{X, X^^\ X^'^\ X^"^^) is a p-valuefor HQ,for all choices of the statistic T. 



Proof. By construction and under Hq, X, X^^\ X^"^^ are conditionally i.i.d., given N{X). It follows 
that r(X),r(X(i)), ...,T(X("')) are conditionally i.i.d., given N{X), and regardless of the choice of test 
statistic T. Therefore, the proposition follows from Lemmas A.l| and p^.2| □ 

The scientific motivation for the null hypothesis is treated in the main article. The following examples 
help to clarify its scope. 

Example (The case A=l.) In the case A=l, Hq contains every spiking process. Thus, if we view the full 
hypothesis space for a 1 ms discretization as a nested hierarchy - for example, 

{A = 100 ms} C {A = 10 ms} C {A = 1 ms}, (12) 

- then the complete hierarchy contains every probability distribution on X. This is one of the senses in 
which the method is nonparametric. 
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Example (Inhomogeneous Bernoulli Processes.) Xi, Xm is a Bernoulli process with parameters 
...,p{m) where Pr(Xj = 1) = p{i). The Bernoulli process is the discrete analogue of an inhomoge- 
neous Poisson process, with p{t) playing the role of the rate function. Inhomogeneous Bernoulli processes 
are included in the null hypothesis when the rate function p{t) is constant along the intervals; i.e., when it 
takes the form 

p{t) = Y,c,\{t^T,]. (13) 
The rate function p{t) can itself be random. Thus the null hypothesis includes random inhomogeneous 



Bernoulli processes which are drawn randomly from a distribution of rate functions satisfying ( 13l. (The 
latter class corresponds to the case that the c/s are real- valued random variables with any joint distribution). 

Example (Deterministic A^(X).) Hq contains the case in which the experimental conditions exactly deter- 
mine the counts in the A intervals, but the spike times are otherwise uniform. Thus in this case, if one took 
Hq as an estimation model, the maximum likelihood estimate for Pr(A^(X)) is the point distribution which 
produces the observed sequence of interval-counts with probability one. 

We note one final alternative characterization of the null. Loosely speaking, one way to motivate the per- 
spective here is to interpret N{X) as containing the information about spiking structure at time scale A. 
Then, the null hypothesis is that all of the probabilistic structure in the spike train is contained in N{X). 
One formalization of that idea is through the following proposition. 

Proposition A.4. (Probabilistic-invariance) 

The following conditions are equivalent: 

i) Pr belongs to Hq. (i.e., Hq is true) 

ii) there exists a function h such that Pr(JC = x) = h(N{x)). 



Proof. If (i) is true, then (ii) follows from the definition of Hq in Proposition A.2 If (ii) is true, then clearly 
Pr(X = x) is constant on {x : N{x) = n} for each n. So Pi{X = x\N{X) = n) is uniform and (i) 
holds. □ 



A.2.1 Multiple Spike Trains 

These methods generalize to any finite collection of q spike trains X = {Xi, X2, Xq], un- 
der the same setup as the previous section. The conditioning random variable is now N{X) := 
{N{Xi);N{X2):,...\N{Xq)}, and the test statistic is any function of the collection T{X). Surro- 
gate spike trains are formed as before, independently for each spike train, to form surrogate data set 
= {Xf,xf,..., Xf}. The p-value is 

X"). X<^). .... X'^') = ' + ^ ^'^» . (14) 

The statement and justification of the hypothesis test are, using the multiple spike train notation, the same 
as in the previous section. Generalizations to multiple spike trains of other versions of jitter (below) are 
equally straightforward, and will not be re-stated for each case. 



A.2.2 Continuous Time 

Alternatively, one can set up jitter in the continuum. We represent the spike times t := (ti, ^2, ■■■tR) where 
K is the total number of spikes in the spike train. Then let ilfc := [ak, ajt + A] be the jitter window associated 
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with spike k, specified by at = [t^/AJ A, wliere we denote [xj , the. floor of x, as the largest integer less 



than or equal to x. To form surrogate data set t^^\ we assign independently to each t^*^ a sample from the 



uniform distribution on Qk- So t\'',t2', ■■■,t^X' conditionally independent, given r2i,il2i ■■■,^k, with 
conditional density 

Jk{x) = (15) 

In the continuum representation, we will denote the interval counts as N{t). (Note that we are "overloading" 
the functions N, T, and p, by borrowing the notation from the discrete setup.) There is an invertible mapping 
from {^k}k=i to ^(*) modulo permutations, so they determine an equivalent conditioning. Given a test 
statistic T(t), the p- value and the test are essentially the same as in the discrete case: 

p(M'".^'-'....,t'^')= ^^^--'^f^'f' (.6) 

Proposition A.5. (Continuous Uniform Interval Jitter) 

Define Hq as 

Hq : the conditional distribution oft given N{t) is uniform, meaning that its conditional density given 
N{t) is: 

1 ^ 

fiih, ...,tk)\N{t)) = ^ n ^i*^' ^ ^1^) 

k=i 

Then, p{t, t^^\t^'^\ t^^^^) is a p-value for HQ,for all choices of the statistic T. 

Proof. Under HQ,t, t^^\ ...,t^-^^ are conditionally i.i.d., given N{t), so the logic is the same as in the dis- 
crete case. □ 

Example (Poisson and Cox Processes) Analogous to the inhomogeneous Bernoulli process in the discrete 
setup, the null includes Poisson processes with rate X{t) of the form 

Included as well are Poisson processes with stochastic rate functions ("Cox processes"), in which the rate 



function is drawn randomly from a distribution of rate functions of the form of (18 1; that is, the Cj's are 
random variables with any joint distribution. Intuitively, one can think about this as a test for the class of 
processes well-approximated by Cox processes whose rates are piecewise-constant in the interval partitions. 
This is a notion of a "Cox process varying at timescale A." (Note, however, that the null is far broader 
than the class of Cox processes because no constraints are placed on the distribution of spike counts in the 
A— length intervals.) 

A.3 Basic Jitter 

Basic jitter is similar to continuous interval jitter, except the jitter windows are centered around the spikes in 



the original data set. Following the development of Section A.2.2 take instead = — A/2. With respect 
to the resampling scheme, everything else is the same. However, in this scheme, t,t^^\ ...,t^'^^ are not 
exchangeable regardles^of the distribution Pr(f). For this reason, an analogue of Proposition A.2 



cannot 

hold for arbitrary T, and we recommend thinking of basic jitter as a heuristic procedure, and with some 
caution, particularly for complex choices of the statistic T. Interval jitter can be motivated as one possible 
way to set up a spike timing null hypothesis under which a specifiable resampling procedure preserves 
exchangeability. 

'Excluding the deterministic zero-spike distribution. 
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A.4 Pattern Jitter 



Some forms of fine temporal structure may be deemed uninteresting, and if the test statistic is sensitive to that 
form of structure, a jitter rejection then might be consequently uninteresting. The generic examples in the 
spike-resampling literature fS] are the local interspike interval constraints, such as bursts and the refractory 
periods. These phenomena can impose hard or soft constraints on local interspike intervals. Pattern jitter 
extends the elementary spike jitter procedures so as to separate out the influence of such constraints. Here 
we follow the development of ||T9ll . 

Suppose an individual spike train t := {ti,t2, t/^), is represented as an increasing sequence of spike 
times, t is broken down into the patterns which compose it. A pattern consists of a subsegment of the 
spike train for which all neighboring spikes {ti, tj+i) have interspike interval tj+i — tj < R, and which is 
separated from all spikes not in the pattern by a distance greater than R. The parameter R thus encodes the 
length of history constraint which spike times must respect to preserve patterns. Under the constraint that 
the identity of all patterns thus defined is preserved, patterns are rigidly and uniformly jittered by jittering 
the pattern's first (earliest) spike uniformly within its jittering window. (The jitter windows can be chosen 
as in interval or basic jitter; the issues are the same: the two choices lead to procedures that are often close 
approximations of each other, but when windows are chosen as in interval jitter there is a corresponding 
exact hypothesis test.) We use = [o-k, o-k + A] to denote the jitter windows in what follows. 

To formalize the idea we specify a (vector-valued) statistic S{t) that determines the component patterns. 

S^l{t) :={ , . (19) 

\ti — otherwise 

SMt):=ll ■f*--*->^ ,20, 
I otherwise 

In words, given a spike train of K spikes, Sk2 determines whether spike k is the first spike in a pattern, or 
not. If A; is a first spike. Ski determines the (starting point of the) A— interval that contains the first spike, 
otherwise Ski specifies the interspike interval between spike k and the neighboring spike that precedes it, 
in its pattern. Thus S{t) determines the pattern-composition of a spike train, and the jitter window which 
contains the first spike in each pattern. Pattern jitter resamples are drawn uniformly from the set of spike 
outcomes consistent with S{t). That is, t^^), t^^), t^'^^ are conditionally independent given S{t) and 
drawn from 

Pr(t«|5(t)) = J^^HSit^'^) = S{t)}, (21) 

where Z{S{t)) is the normalizing constant determined by the constraint that the conditional distribution sum 
to one. The computational problem of sampling from this distribution is non-trivial; an efficient procedure 
is developed in [19] . 

If we choose the jitter windows as in interval jitter, then the function p (e.g., Eq. [16]) is a p— value 
under the null hypothesis that the conditional distribution on t, given S{t) is uniform. The reasoning, via 
exchangeability, is just as in interval jitter. The procedure can be set up either in discrete or continuous time. 



A.5 Tilted Jitter 

Fix A, and consider now the continuous time representation of the spike train t = {ti,t2, ...,tK), with 
corresponding intervals Oi, with Qk = [ofc, o-k + A]. In the conditional inference perspective 

developed above, a null hypothesis is specified through the conditional uniform distribution on the spike train 
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given N{t). For example, under the continuous interval jitter null hypothesis, ti,t2, ■■■■.tx are conditionally 
independent given N{t), and the conditional density of given N{t) is taken as uniform: 

t{x £ ^k} 

Jk{x) = rp^-^ . (22) 

A natural way to parametrically relax the hypothesis of uniformity is to broaden the uniform conditional 
uniform density fk (x) to a class of smooth density functions whose variation is restricted by a parametric 
bound. Here we quantify variation for a density / by the functional a(/): 

max^g^o^A] /(x) 
mina:e[o,A] f{x) 

and define a parametric class of densities by 

TiS,A,e) = {fGS:a{f)<e}, (24) 

where S is a specified class of smooth (for example, linear) densities on [0, A]. 

We will restrict ourselves to synchrony analysis in what follows. Let us denote a simultaneously recorded 
spike train s := (si, S2, sKs)- We will treat s as fixed. Defining 



C := < X : min Is, — x\ < 6 > , (25) 

I l<j<Ks J 

our synchrony statistic is 

K 

v{t):=Y,HtkeC}, (26) 

k=l 

which is the number of spikes in t that are within ±6 of some spike in s. 
Now we will specify a resamphng procedure. For each k, we compute 

/fc G argmax / f{x-ak)dx. (27) 

(In the case that the argmax contains multiple elements it does not matter which density we choose). 
Methods of computing for various classes of smooth densities S, are provided in Section |B] To form 
surrogate data set t^*), we then assign independently to each t^*^ a sample from the density The function 
p is defined as 

p(M«,t« t(^))^'+^i'-.';'"""'S""». (28) 

t/ ~t~ 1 

Proposition A.6. (Tilted Jitter Synclirony Test) 

Define Hq as 

Hq : under the conditional distribution oft given N{t), ti, ...,tK are conditionally independent with 
conditional densities fk satisfying: 

fk{xk-ak\N{t))eT{S,A,e) y k e {1,2, ...K} (29) 

Then, pit, t^^\t'^^\ t^-^)) is a p-valuefor Hq. 
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Proof. In this case the conditional null hypothesis is "compound" (containing many distributions), so ex- 
changeability does not necessarily hold. Instead, we are drawing the surrogates from the "worst-case" 
distribution in the conditional null. The plan (for the proof) is to randomly perturb the spike times in each 
to generate a new surrogate rW in such a way that v(t), ?;(r(^)), z;(r(^)), i;(r('^)) are conditionally 
i.i.d. given N{t), and hence exchangeable, and furthermore 

z;(r«) < Vi G {1, 2, J}. (30) 

Since by exchangeability r^^^ , r^^^ , r^"^)) is a value (under Hq), and since by equation ([30]) 

p(t, rW, r(2), r^J)) < p(t, t«, t^^), (31) 

the proposition follows. 

It remains to show how to generate r^*) from t^^\ for each i = 1, 2, . . . , J. Fix a distribution Pr S 
Hq. Then by assumption (i.e. by Hq), {t{tk £ C}}|^^ are conditionally independent Bernoulli random 
variables (r.v.'s) given N{t), with parameters := Pr(ifc = C\N{t)). Similarly, G C}}^^^ are 

conditionally independent Bernoulli r.v.'s, given A^(t), with respective parameters := Pr(t^*^ G C\N{t)). 
By construction, pk < p^. Now, for every i = 1, 2, . . . , J and every k = 1,2, . . . , K, independently 

(i) 

generate r^. by the (random) prescription 



J') 



t« iff 

t'j^^ with probability p^/p* if g ^ , (32) 
^ Uk otherwise 



where is an arbitrarily chosen point in Qk H C*^ (the distribution does not matter). By con- 
struction, Vi = 1,2, J, {l{r[.*^ G C}}^^^ are conditionally independent Bernoulli r.v.'s, given 
N{t), with parameters pi,p2, ■..,pk, and r^^), r^^), r^'^) are conditionally i.i.d. given N{t). Hence 
v{t),v{r^^^),v{r^'^'^), f (r^"^)) are conditionally i.i.d given N{t). Furthermore, ^;(rW) < u(tW),Vi G 
{1, 2, J} by definition. 

□ 

A.6 Deterministic tests 

In several cases deterministic (non-stochastic) versions of the tests developed above are available. To give 
the idea, we will develop a single example here for synchrony-count test statistics, applied to a continuous 
uniform interval jitter test for a single spike train t. (A deterministic test can be developed for the tilted jitter 



null hypothesis in an analogous way.) The setup is as in tilted jitter (Section A.5 1, in which we condition on 
the spike positions s of a simultaneously-observed spike train, and define 



C := < X : min Is, — x\ < 6 > (33) 

i i<j<Ks ' ^ J 

through which our synchrony statistic is 

K 

v{t):=Y,l{tkeC}, (34) 

k=l 

which is the number of spikes in t that are within ±5 of some spike in s. Define in addition 

Vkit) := l{tk G C}, (35) 
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so that v{t) = X]a;=i '^k{i)- Under Ho,vi{t), V2{t), ...,VK{t) are conditionally independent Bernoulli r.v.'s, 
given N(t), with parameters pk{N'{t)) := P{vk(t) = l|A^(t)) = |C H Qk\- In this case, the conditional 
distribution of v{t) can be computed exactly (e.g., by convolution). It has the form 

G(c; N{t)) := Fi{v{t) > c\N{t)) := Pr(Ef=i ^ > c), (36) 

where 11,12, ■■■Y'k are independent Bernoulli random variables with parameters {pfe(Ar(t))}|£^. Once G 
is computed, the p- value is 

G{vity,Nit)) (37) 



B Obtaining tlie optimization solution /* 

Tilted jitter requires finding an extremal spike-relocation distribution, constrained by the particular null 
hypothesis being tested (see Section A.5, and Section 3.1 of the main text as well as its elaboration in this 
Supplement). Here we work through the calculation in a few examples. 

Fix R C [0, 1] and let / denote a generic probability density function (pdf) on [0, 1]. We want to 
maximize / subject to the constraint that a(/) < e for some fixed e > 0, where 

maxo<a;<i f{x) 
mino<a;<i J{x) 

and perhaps also subject to additional smoothness constraints, like linearity of /. Define := {/ : a{f) < 
e} and let S be the set of pdfs that satisfy the smoothness constraints. Then the problem of interest is to find 
any 

/* G argmax / f{x)dx. 
feM^nS Jr 

We always assume that the uniform (constant) pdf satisfies the smoothness constraints so that n 5* 7^ 0. 
If dx € {0, 1}, then any / is an arg max and in practice we take /* to be the uniform pdf. Henceforth, 
we assume that 

< \R\ ■= [ dx< 1. 

Jr 

Note that / is linear in / and that Mg is closed and convex. If S is also closed and convex, then 
n 5 is closed and convex and a maximizer /* exists and can be found on the extremal points of Mg n S. 

B.l No smoothness constraints 

If S := {all pdfs on [0, 1]} so that we are maximizing over Mg, then a solution is 

._ 1 + el{x e R} 
l + e\R\ ' 

where t{A} is the indicator function of the event A and where x G [0, 1]. 
Proof. It is easy to verify that a{f*) = e and that 

Now suppose that / is a pdf with Jjif > P- This gives 

ff N ^ !Rf{x)dx (3 1 + e 

max /(a;) > — , > 7— = — - 

^ Ir^x \R\ l + e\R\ 
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and 



/^c f{x)dx 1-/3 
mmj(xj < — ^^-p — < 



fj^cdx 1-\R\ l + e\R\' 
Combining these gives a{f) > e and shows that /* achieves the maximum. □ 

B.2 Finite mixture smoothness constraints 

Let {gi, . . . , gm} G be fixed and known pdfsj^Suppose that S is the set of mixtures of the g^s, namely, 

{mm \ 
^PkQk ■^Pk = l^''^A each Pk>^\. 
k=l k=l ) 

Note that S C M^, so we are maximizing over S. Since S is closed and convex, we can take /* to be an 
extremal point. The extremal points of S are simply {^i, . . . , 5^}- To find /* we compute gu for each k 
and choose the one that gives the largest integral. 

B.3 Finite basis smoothness constraints 

Let {hi,...,hm\ be fixed and known functions on [0,1] that each integrate to 0. For any a := 
(ai, . . . , am) G IK™ define the function on [0, 1] by 

m 

fa{x) := 1 + '^akhk{x) 

k=l 

and let 



A:= {a^W^ : min f{x) > 

1^ 0<x<l 

so that fa is a valid pdf whenever a ^ A. Finally, define 

A,:={a£A: a{fa) < e} . 

Suppose that S := {fa a & A}. Notice that optimizing over n 5" is equivalent to optimizing over 
A^. The computational problem, then, is to find a maximizer of the function 

/ fa{x)dx= / dx + S^ak I hk{x)dx 
JR JR j^^i JR 

over a ^ A^. Defining H := {Hi, . . . , Hm) for := hk{x)dx, which are known, the computational 
problem reduces to finding 

a* € argmax a-H, 

where a ■ H denotes the dot product. Note that a • if is linear in a and A^, is closed and convex, so we can 
restrict attention to the extremal points of A^. 

There exist generic algorithms for solving convex optimization problems of this kind, although for spe- 
cific instances it can be computationally much quicker to use custom algorithms based on further analysis. 



'^If it is not known tliat eachi gk £ M^, then define — gi^ — 1 and use the methods in Section B.3 perhaps with the additional 
restriction that a is a probability vector. 
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B.3.1 Linear smoothness constraints 

A simple example is the case where m = 1 and hi (x) := 2a; — 1, so that S is the class of linear pdfs. Noting 
that 

/f\ ^ + 1^1 1 



we see that 



e + 2' e + 2 



so the extremal points of Af. are the two points — e/ (e + 2) and e/ (e + 2) . The values of a-H at these extremal 
points are —He/ (e + 2) and He/ (e + 2), respectively. Choosing the maximum and then the corresponding 
fa gives the solution 



fix) 



' 1 + (2x - 1) if Jj^{2x -l)dx> 0, 

6 "h 2 



{2x - 1) if Jj^{2x - l)dx < 0, 



e + 2 



otherwise 
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